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This work proposes a novel approach for digital image processing 
that relies on inexact computation to address some of the issues with
discrete cosine transform (DCT) compression. The proposed
system has three processing levels: the first employs an approximate
DCT for image compression, which eliminates all computationally demanding
floating-point multiplications and executes the DCT with
integer additions and, in certain cases, logical right/left
shifts. The second level reduces the amount of data that must
be processed (from the first level) by filtering out frequencies that
cannot be perceived by human senses. Finally, to reduce
power consumption and delay, the third level employs inexact
circuit-level adders for the DCT computation. A set of structured
images is compressed with the proposed three-level method for assessment.
Various figures of merit (such as energy consumption,
delay, peak signal-to-noise ratio, average difference, and maximum
absolute difference) are compared with those of existing compression
techniques; an error analysis is also carried out to substantiate the
simulation findings. The results indicate significant reductions in energy
and delay while retaining acceptable accuracy levels for
image processing applications.


KEYWORDS: Approximate computing, DCT, inexact computing, 
image compression 


1. INTRODUCTION
Today's computing systems usually process a significant amount of information, which is computationally and power intensive. Digital Signal Processing (DSP) systems are widely used to process image and video information, often in mobile/wireless environments. These DSP systems use image/video compression methods and algorithms. However, the demands on power and performance remain very stringent. Compression methods are often used to ease such demands. Image/video compression techniques are classified into two types: lossless and lossy.

The latter type is more hardware efficient, but at the expense of the quality of the final decompressed image/video. The Joint Photographic Experts Group (JPEG) technique is the most extensively used lossy approach for image processing, while the Moving Picture Experts Group (MPEG) method is the most widely used lossy method for video processing. Both standards use the Discrete Cosine Transform (DCT) algorithm as the first processing stage. Many fast DCT computation techniques [1], [2] have been developed for image and video applications; however, since all of these algorithms still use floating-point multiplications, they are computationally demanding and require substantial hardware resources. To solve these problems, the coefficients of many algorithms, such as [3], can be scaled and approximated by integers, allowing floating-point multiplications to be replaced by integer multiplications [4], [5].


Because the resulting algorithms are substantially faster than the original ones, they are widely employed in practical applications. As a result, the design of good DCT approximations that can be implemented with narrower bus widths and simpler arithmetic operations (such as shift and addition) has gained a lot of attention in recent years [6]. Image/video processing has the benefit of being highly error-tolerant; human senses typically cannot detect a moderate degradation in performance, such as in the quality of visual and audio information. As a result, inexact computing may be employed in many applications that tolerate some loss of accuracy and some uncertainty, such as image/video processing [7], [8].

Introducing inexactness at the circuit level in the DCT computation is challenging and targets specific figures of merit (such as power dissipation, delay, and circuit complexity [9], [10], [11], [12], [13], [14]). This technique is aimed at low-power applications. A logic/gate/transistor-level redesign of an exact circuit is used to achieve process tolerance. A logic synthesis technique [9] has been described for designing circuits that implement an inexact version of a given function, using the error rate (ER) as the parameter for error tolerance. Reducing the complexity of an adder circuit at the transistor level (for example, by truncating the circuits at the lowest bit positions) reduces power dissipation more than conventional low-power design techniques [10]; in addition to the ER, new figures of merit for estimating the error in an inexact adder have been presented in [11].

This paper introduces a novel framework for approximate DCT image compression, which is based on inexact computing and has three levels. Level 1 is a multiplier-less DCT transformation that uses only additions and shifts; Level 2 is high-frequency component (coefficient) filtering; and Level 3 is computation using inexact adders. Level 1 has received much attention in the technical literature [16], [17], [18]; Level 2 is an intuitive strategy for reducing computational complexity while incurring only a marginal loss in compression quality. Level 3 employs a circuit-level technique to pursue inexact computing (although new and efficient inexact adder cells are used in this manuscript). The contribution of this paper therefore lies in the combined effect of these three levels. The proposed framework has been extensively analyzed and evaluated. For image compression as an application of inexact computing, simulation and error analysis show remarkable agreement in their results. To prevent misunderstanding, the term "approximate" refers specifically to DCT methods, while the term "inexact" refers to circuits and designs that employ non-exact hardware to compute the DCT.


2. REVIEW OF DCT
For completeness, preliminaries on the approximate DCT and a review of relevant topics are presented next.


2.1. Discrete Cosine Transform (DCT)
To obtain the (i, j)th DCT-transformed element of an image block (represented by an N x N matrix p), the following equation is used:


$$D(i,j) = \frac{2}{N}\, C(i)\, C(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p(x,y) \cos\left[\frac{(2x+1)\, i \pi}{2N}\right] \cos\left[\frac{(2y+1)\, j \pi}{2N}\right] \qquad (1)$$

$$C(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & u > 0 \end{cases}$$

where p(x, y) is the (x, y)th element of the image block. This equation calculates one entry, the (i, j)th, of the transformed image from the pixel values of the original image matrix. For the commonly used 8 x 8 block in JPEG compression, N is equal to 8 and x and y range from 0 to 7. Therefore, D(i, j) is also given by the following equation:
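$$D(i,j) = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} p(x,y) \cos\left[\frac{(2x+1)\, i \pi}{16}\right] \cos\left[\frac{(2y+1)\, j \pi}{16}\right] \qquad (2)$$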


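For matrix calculations, the DCT matrix T is obtained from the following:

$$T(i,j) = \begin{cases} \dfrac{1}{\sqrt{N}}, & i = 0 \\[2mm] \sqrt{\dfrac{2}{N}} \cos\left[\dfrac{(2j+1)\, i \pi}{2N}\right], & i > 0 \end{cases} \qquad (3)$$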

The DCT is therefore computationally intensive and may require floating-point operations unless an approximate algorithm is used.
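The relation between the double-sum definition and the matrix form can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `dct_matrix`, `dct2_by_definition`, and `dct2_by_matrix` are illustrative and do not come from the paper. It builds T as in Eq. (3) and compares T p T^T with the double sum for an 8 x 8 block.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """DCT matrix T of Eq. (3): row 0 is 1/sqrt(N), the others are scaled cosines."""
    i = np.arange(n).reshape(-1, 1)   # row index (frequency)
    j = np.arange(n).reshape(1, -1)   # column index (sample position)
    t = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    t[0, :] = 1.0 / np.sqrt(n)
    return t

def dct2_by_definition(p: np.ndarray) -> np.ndarray:
    """2-D DCT of a square block evaluated directly as a double sum."""
    n = p.shape[0]
    scale = lambda u: 1.0 / np.sqrt(2.0) if u == 0 else 1.0   # C(u)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (p[x, y]
                            * np.cos((2 * x + 1) * i * np.pi / (2 * n))
                            * np.cos((2 * y + 1) * j * np.pi / (2 * n)))
            # 2/N equals the 1/4 factor of Eq. (2) when N = 8.
            d[i, j] = (2.0 / n) * scale(i) * scale(j) * acc
    return d

def dct2_by_matrix(p: np.ndarray) -> np.ndarray:
    """Same transform in matrix form: D = T p T^T."""
    t = dct_matrix(p.shape[0])
    return t @ p @ t.T

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
print(np.allclose(dct2_by_definition(block), dct2_by_matrix(block)))
```

The direct evaluation needs four nested loops with two cosine multiplications per term, which illustrates why the exact DCT is costly without a fast or approximate algorithm.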


2.2. Joint Photographic Experts Group (JPEG)
JPEG processing starts by transforming an image to the frequency domain using the DCT; this separates the image into parts of differing frequencies. Quantization is then performed such that frequencies of lesser importance are discarded. This reflects the fact that humans are reasonably good at seeing small differences in brightness over a relatively large area, but often cannot distinguish the exact strength of a rapidly varying brightness change. During this quantization stage, each frequency-domain component is divided by a constant and then rounded to the nearest integer to compress it. As a consequence, many high-frequency components have very small, negligible values or, most likely, zero values. The image is then recovered during the decompression process, which is carried out using only the retained frequencies. JPEG processing consists of the following steps:

1. An image (in color or grayscale) is first segmented into k x k pixel blocks (k = 8 is typical).

2. The DCT is then applied to each block, from left to right and top to bottom.

3. This produces k x k coefficients (so 64 for k = 8), which are quantized to reduce their magnitudes.

4. The compressed image, i.e., the stored or transmitted image, is represented by the resulting array of compressed blocks.

5. To retrieve the image, the compressed image (array of blocks) is decompressed using the Inverse DCT (IDCT).
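The five steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the JPEG standard itself: a single constant `q` stands in for the quantization table, entropy coding is omitted, the image is assumed to be grayscale with dimensions that are multiples of k, and the names `compress` and `decompress` are hypothetical.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT matrix (row 0 is 1/sqrt(n))."""
    i = np.arange(n).reshape(-1, 1)
    j = np.arange(n).reshape(1, -1)
    t = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    t[0, :] = 1.0 / np.sqrt(n)
    return t

def compress(image: np.ndarray, q: float = 16.0, k: int = 8) -> np.ndarray:
    """Steps 1-4: split into k x k blocks, apply the DCT, quantize each coefficient."""
    t = dct_matrix(k)
    h, w = image.shape                      # assumed to be multiples of k
    coeffs = np.zeros((h, w), dtype=np.int32)
    for r in range(0, h, k):                # top to bottom
        for c in range(0, w, k):            # left to right
            block = image[r:r + k, c:c + k].astype(float)
            # Divide by the constant q and round to the nearest integer.
            coeffs[r:r + k, c:c + k] = np.rint(t @ block @ t.T / q)
    return coeffs

def decompress(coeffs: np.ndarray, q: float = 16.0, k: int = 8) -> np.ndarray:
    """Step 5: rescale the coefficients and apply the inverse DCT block by block."""
    t = dct_matrix(k)
    h, w = coeffs.shape
    image = np.zeros((h, w))
    for r in range(0, h, k):
        for c in range(0, w, k):
            block = coeffs[r:r + k, c:c + k] * q
            image[r:r + k, c:c + k] = t.T @ block @ t   # IDCT, since T is orthonormal
    return np.clip(np.rint(image), 0, 255).astype(np.uint8)
```

For a flat image every block keeps only its DC coefficient after quantization, which is the effect the quantization step relies on to shrink the data.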


3. INEXACT ADDITION AND APPROXIMATE DCT

Arithmetic circuits are particularly well suited to inexact computation; they have also been widely studied in the technical literature.


TABLE 1
Approximate DCT Methods Applied to Image Compression: Number of Operations Required
to Calculate the DCT for an 8x8 Block Size

Method                         Additions   Multiplications   Shifts   Total operations
DFT by definition [1]          56 (432)    64 (192)                   624
DFT, Cooley-Tukey [1]          24 (68)     (6)                        64
DCT by definition [2]          56          64                         120
Arai algorithm [3]             29          5
Multiplier-less:
  SDCT [23]                    24          0
  BAS08 [25]                               0
  BAS09 [26]                               0
  BAS11 [27] with a = 0                    0
  BAS11 [27] with a = 1                    0
  BAS11 [27] with a = 2                    0
  CB11 [28]                                0
  BC12 [29]                                0
  PEA12 [16]                               0
  PEA14 [17]                               0

Addition is an essential arithmetic operation in many inexact computing applications. A decrease in circuit complexity at the transistor level of an adder circuit frequently results in a significant reduction in power dissipation, often larger than that provided by standard low-power design approaches [10]. In [12], inexact adder designs were evaluated: inexact operation was introduced either by replacing the accurate cell of a modular adder design with an approximate cell of reduced circuit complexity, or by changing the generation and propagation of the carry in the addition process. Three novel inexact adder cell designs (denoted InXA1, InXA2, and InXA3) are reported in [14]; these cells have both electrical and error characteristics that are particularly advantageous for approximate computing. Compared with prior designs [10], [13], these adder cells offer the following advantages: (i) a small number of transistors; (ii) a small number of erroneous outputs at the two outputs (Sum and Carry); and (iii) reduced switching capacitances (expressed in units of Cgn, the gate capacitance of a minimum-size NMOS transistor), resulting in a significant decrease in both delay and energy dissipation, as well as in their product as a combined metric (Table 3).

Table 3 reports the delay, energy dissipation, and EDP (energy-delay product) of the inexact cells for both the average and worst cases. InXA1 has the lowest average and worst-case delays, while InXA2 has the lowest average and worst-case energy dissipation, as well as the lowest average EDP. Extensive simulation was used to establish the average and worst-case delay and energy dissipation of the adder cells. The delay for each input signal is recorded when the output reaches 90% of its maximum value, whereas the energy dissipated in all transistors is measured when the output reaches 90%. Based on these benefits, InXA1- and InXA2-based adders are considered for the DCT application, as discussed below.
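The idea of replacing the accurate cell of a modular adder with a simpler approximate cell in the lowest bit positions can be sketched at the bit level. The truth tables of InXA1 to InXA3 are not given here, so the approximate cell below is a hypothetical stand-in (Sum = a OR b, Carry = a AND b, carry-in ignored) and not one of the InXA designs; `ripple_add` and its parameters are also illustrative names.

```python
def exact_cell(a: int, b: int, cin: int):
    """Exact full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def approx_cell(a: int, b: int, cin: int):
    """Hypothetical simplified cell that ignores the carry-in (not an InXA cell)."""
    return a | b, a & b

def ripple_add(x: int, y: int, width: int = 16, nab: int = 0) -> int:
    """Ripple-carry addition of two unsigned integers.

    The nab least significant positions use the approximate cell;
    the remaining positions use the exact cell.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        cell = approx_cell if i < nab else exact_cell
        s, carry = cell(a, b, carry)
        result |= s << i
    return result | (carry << width)   # final carry becomes the top bit

print(ripple_add(5, 3, width=8, nab=0))   # exact addition
print(ripple_add(5, 3, width=8, nab=2))   # two approximate low-order cells
```

With this particular stand-in cell the error is confined to the low-order part of the result, which is why the number of approximate bits acts as a quality knob.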


4. PROPOSED APPROXIMATE FRAMEWORK

This paper introduces a novel image compression framework with three levels of approximation, as described below.

Level 1 is the multiplier-less DCT transformation, Level 2 is high-frequency filtering, and Level 3 is inexact computation. Levels 1 and 3 have already been discussed. Although high-frequency filtering (Level 2) is not a novel idea, it is described here for completeness, since it contributes to the execution time and energy savings of the proposed framework. In this level, rather than executing the quantization process on all coefficients resulting from the DCT transformation, the operation is performed only on the coefficients for the low-frequency components of the transformed block.
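The Level 2 filtering described above can be sketched as keeping only the first RC coefficients of each transformed block. The order in which coefficients are selected is not specified in the text; the sketch assumes the JPEG-style zigzag scan from low to high frequency, and the names `zigzag_order` and `retain_coefficients` are hypothetical.

```python
import numpy as np

def zigzag_order(n: int = 8):
    """Index pairs of an n x n block, ordered from low to high frequency."""
    def key(ij):
        i, j = ij
        s = i + j                       # anti-diagonal number
        return (s, i if s % 2 else j)   # alternate the direction on each diagonal
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)

def retain_coefficients(d: np.ndarray, rc: int) -> np.ndarray:
    """Keep the first rc coefficients in zigzag order and zero all the others."""
    out = np.zeros_like(d)
    for i, j in zigzag_order(d.shape[0])[:rc]:
        out[i, j] = d[i, j]
    return out

d = np.arange(1, 65).reshape(8, 8)      # stand-in for a block of DCT coefficients
print(retain_coefficients(d, 10))
```

Only the retained positions need to be quantized (or even computed), which is where the execution time and energy savings of this level come from.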


4.1. High Frequency Filtration

Filtering out the high frequencies results in an image whose degradation is hardly discernible by the human eye (which is mostly sensitive to low-frequency content).

This property can be exploited to compress an image. As previously stated, the DCT transforms the image into the frequency domain, so that the coefficients that represent the high-frequency components (and so are not visible to the human eye) may be discarded while the other coefficients are retained. For image compression applications, different numbers of retained coefficients have been investigated; it has been shown that just 0.34-24.26 percent of the 92 x 112 DCT coefficients are adequate in high-speed face recognition applications.

Other works report image compression using a support vector machine that considers only the first 8-16 coefficients, an image reconstruction approach based on only three coefficients, and an evaluation and comparison of several image compression algorithms using only ten coefficients [25].

4.2. DCT Implementation Estimate

For the approximate DCT techniques listed in Table 1, all required computations (addition and subtraction) are executed at the bit level using the corresponding logic functions. All operators are 32 bits long, and the implementations are simulated in MATLAB using their Boolean logic functions.

Selected approximate DCT techniques are simulated for the Lena image, and the results are presented in Fig. 1, where the Peak Signal to Noise Ratio (PSNR) of all methods is plotted against the number of Retained Coefficients (RC) used in the quantization step of the compression. The PSNR is derived from the Mean Square Error (MSE), which is given by:


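$$\mathrm{MSE} = \frac{1}{m\,n} \sum_{x=1}^{m} \sum_{y=1}^{n} \left[ p(x,y) - \hat{p}(x,y) \right]^2$$

where p(x, y) is the exact value of the image pixel at row x and column y, \hat{p}(x, y) is the approximate value of the same pixel, and m and n are the image's dimensions (rows and columns, respectively). The Peak Signal to Noise Ratio (PSNR) is then:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^{b} - 1)^2}{\mathrm{MSE}} \qquad (6)$$

where b is the number of bits per pixel (b = 8 gives a peak value of 255).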
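A minimal sketch of these two quality metrics is given below, assuming NumPy and 8-bit pixels by default; the function names `mse` and `psnr` and the `bits` parameter are illustrative.

```python
import numpy as np

def mse(p, p_hat) -> float:
    """Mean square error between the exact image p and the approximate image p_hat."""
    diff = np.asarray(p, dtype=float) - np.asarray(p_hat, dtype=float)
    return float(np.mean(diff ** 2))

def psnr(p, p_hat, bits: int = 8) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the two images are identical."""
    error = mse(p, p_hat)
    if error == 0.0:
        return float("inf")
    peak = (2 ** bits - 1) ** 2          # 255^2 for 8-bit pixels
    return float(10.0 * np.log10(peak / error))
```

Converting to floating point before subtracting matters for unsigned 8-bit images, where a direct subtraction would wrap around instead of going negative.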


The results show that compression using CB11 delivers the best PSNR values, except for the non-orthogonal SDCT approach. Three types of behavior are observed:

1. The output quality increases as the number of retained coefficients (RC) increases. This happens with CB11, BAS08, BAS09, and BAS11 (a = 0 and a = 1).

2. A virtually constant PSNR is obtained as the RC is raised. This happens with BC12 and PEA14.

3. The output quality decreases as the RC increases. This happens with both BAS11 (a = 2) and PEA12.

Two additional metrics, the Average Difference (AD) and the Maximum Absolute Difference (MD), are employed to gain a better understanding of the resulting quality. These metrics are defined as follows. Average Difference (AD):


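$$\mathrm{AD} = \frac{1}{m\,n} \sum_{x=1}^{m} \sum_{y=1}^{n} \left[ p(x,y) - \hat{p}(x,y) \right] \qquad (7)$$

Maximum Absolute Difference (MD):

$$\mathrm{MD} = \max_{x,y} \left| p(x,y) - \hat{p}(x,y) \right| \qquad (8)$$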
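The two difference metrics can be computed in the same way. This is a small sketch assuming NumPy; the names `average_difference` and `max_abs_difference` are illustrative.

```python
import numpy as np

def average_difference(p, p_hat) -> float:
    """AD: mean of the signed pixel differences, so opposite errors can cancel."""
    diff = np.asarray(p, dtype=float) - np.asarray(p_hat, dtype=float)
    return float(np.mean(diff))

def max_abs_difference(p, p_hat) -> float:
    """MD: largest absolute pixel difference anywhere in the image."""
    diff = np.asarray(p, dtype=float) - np.asarray(p_hat, dtype=float)
    return float(np.max(np.abs(diff)))
```

AD captures a systematic bias in the reconstructed image, while MD reports the single worst pixel, so the two complement the PSNR.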


Fig. 2 shows the resulting AD and MD for all methods; the average difference between the uncompressed and inexact-compressed images becomes smaller as RC increases, except for BAS11 (a = 2) and PEA12 (further confirming the PSNR results in Fig. 1). Fig. 2 also shows that the MD between the pixels of the uncompressed and inexact-compressed images is reduced as more retained coefficients are used; the exceptions are again PEA12 and BAS11 (a = 2). This further confirms the previous results.

Fig. 1: Compression of an image using approximate DCT and bit-level exact computing.

Fig. 2: Maximum absolute difference (MD) for compression of an image using approximate DCT.

Fig. 3: Approximate DCT compression of an image using inexact adders with different NAB (Number of Approximate Bits) values; NAB = 3.

Fig. 4: Approximate DCT compression of an image using inexact adders with different NAB values; NAB = 4.

Fig. 5: Approximate DCT compression of an image using inexact adders with different NAB values; NAB = 5.

[Plot: PSNR vs. number of retained coefficients for different NAB values; curves for CB11, SDCT, BAS11 (a = 1), BAS11 (a = 0), BC12, and PEA14.]

Fig. 6: Approximate DCT compression of an image using inexact adders with different NAB values; NAB = 6.
Fig. 4 depicts the Lena image compressed using the most accurate method, CB11, for three RC values, i.e., 4, 10, and 16 retained coefficients. For comparison purposes, this figure also shows the exact DCT compression result with RC = 16.
4.3. Approximate DCT Using Inexact Computing

Consider next the approximate DCT compression of Lena using inexact adders; as previously, the value of the NAB is increased from 3 to 6. The PSNR results are shown in Figs. 3, 4, 5, and 6 versus RC. Each row shows the PSNR of the compressed images (a measure of quality) obtained by running all approximate DCT algorithms with just one inexact adder (for example, AMA1 as the inexact adder in the top row). Each column shows the compressed image quality obtained by running all approximate DCT algorithms with all inexact adders (for one NAB value). The leftmost column, for example, is for NAB = 3. As expected, the PSNR degrades as the NAB grows (an acceptable level of PSNR is obtained at a NAB value of 4).

4.4. Truncation

Truncation is one of the inexact computing strategies that may be used; truncation results are also shown. The use of inexact adders yields more accurate results than truncation (truncation is performed at 3 and 4 bits).
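The difference between truncation and an inexact adder can be illustrated with a small experiment. The sketch below makes two assumptions that are not taken from the paper: truncation is modeled as clearing the NAB lowest bits of both operands before an exact addition, and the inexact adder is a lower-part OR adder used as a stand-in for the InXA-based designs. All function names are hypothetical.

```python
def truncated_add(a: int, b: int, nab: int) -> int:
    """Exact addition after clearing the nab lowest bits of both operands."""
    mask = ~((1 << nab) - 1)
    return (a & mask) + (b & mask)

def loa_add(a: int, b: int, nab: int) -> int:
    """Lower-part OR adder: OR the nab lowest bits, add the upper bits exactly."""
    if nab == 0:
        return a + b
    low_mask = (1 << nab) - 1
    low = (a | b) & low_mask
    carry = (a >> (nab - 1)) & (b >> (nab - 1)) & 1   # carry into the exact part
    return (((a >> nab) + (b >> nab) + carry) << nab) | low

def mean_abs_error(adder, nab: int, width: int = 8) -> float:
    """Average absolute error over all pairs of unsigned width-bit operands."""
    vals = range(1 << width)
    total = sum(abs(adder(a, b, nab) - (a + b)) for a in vals for b in vals)
    return total / (1 << (2 * width))

for nab in (3, 4):
    print(nab, mean_abs_error(truncated_add, nab), mean_abs_error(loa_add, nab))
```

Under these assumptions the approximate adder keeps part of the low-order information that truncation discards, which matches the qualitative observation that inexact adders give more accurate results than truncation for the same number of affected bits.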


5. CONCLUSION

This paper introduced a novel method for compressing images using the Discrete Cosine Transform (DCT) algorithm. The proposed method consists of a three-level structure in which a multiplier-less DCT transformation (containing only additions and shift operations) is performed first, followed by high-frequency component (coefficient) filtering and computation using inexact adders. It has been shown that, with 8 x 8 image blocks, each level contributes to an approximation in the compression process while still producing a very good quality image at the end. The contribution of this paper lies in the combined effects of these three levels; simulation and error analysis have shown a remarkable agreement in results for image compression as an application of inexact computing.


The proposed framework has been shown to be effective for a DCT approach that incorporates approximation at all three levels, and the following specific findings have been established in this paper through simulation and error analysis.

When using exact 16-bit adders, CB11 delivers the highest-quality compression (highest PSNR values) of any approximate DCT approach (Fig. 1). The other quality metrics for image processing (AD and MD) confirm the PSNR findings (Figs. 1 and 2). The next best methods are BAS08, BAS11 with a = 0, and BAS11 with a = 1.

InXA2 has been found to perform the best among the inexact adders studied [14].

When using inexact adders, non-truncation-based approaches give better results than the corresponding truncation schemes when implementing approximate DCT JPEG compression, particularly for larger NABs (Figs. 3, 4, and 6).

When diverse images are used, the DCT computed with inexact adders produces consistent results. In general, NAB values up to 4 provide acceptable compression; larger NAB values reduce the quality of the results significantly.

Using four image benchmarks, the BC12 and PEA14 techniques require the least execution time and energy to compress an image when compared to using an exact adder.

In terms of the highest PSNR as a measure of image quality, the approximate DCT technique CB11 delivers the highest value; however, when both execution time and energy savings are considered, the best approximate DCT method is BAS09.